FIELD OF THE INVENTION

[001] The present disclosure generally relates to analyzing the relationships
between messages sent between nodes.

BACKGROUND

[002] Many commercially-important systems, especially Web-based 
applications, are composed of a number of communicating components. These systems 
are often structured as distributed systems, with components running on different 
processors or in different processes. For example, a multi-tiered system may process
requests from Web clients that flow through a Web-server front-end and then to a Web 
application server. The application server may then call a database server, for example, or 
other types of services such as for authentication, name service, credit-card authorization, 
or customer relationship management or other support functions. 

[003] Distributed systems can be difficult to debug, especially when users
experience poor performance. Diagnosing performance issues is even more difficult in
distributed systems if the constituent components are “black-box” components. For
example, some distributed systems may be constructed from software from many
different, and perhaps competing, vendors, and the source code of the different
components may be unavailable. Without more than a high-level understanding of the
functions provided by the various components, and without the information that could be
learned from examination of the source code, selecting a component to begin investigating
may involve guesswork and result in wasted time.

[004] The business model under which distributed systems are sold and
deployed also contributes to the difficulties associated with addressing performance
problems. Enterprises often buy complex systems as complete, customized packages from
solutions vendors. Solutions vendors may be pressured to deliver complex
component-based systems without the expense of highly-skilled, experienced programmers.
While modestly-skilled programmers can design and construct such systems, they may lack
the expertise to debug performance problems efficiently. Vendors of individual components
may provide training and support for solving performance problems within the
components, but not necessarily support for solving performance problems when
components from other vendors are involved. Thus, whole-system performance
debugging may require either an inordinate amount of time or the services of expensive
and hard-to-find systems integration experts. The present invention may address one or
more of the above issues.

SUMMARY 

[005] The various embodiments of the invention support determining causal 
relations between a plurality of intercommunicating nodes. Communications between the 
nodes may be described by input trace data. The trace data may include for each message
sent between nodes a timestamp that indicates a time at which the message was sent, a 
source identifier that identifies a node from which the message was sent, and a destination 
identifier that identifies a node to which the message was sent. For each of one or more 
nodes, a determination is made as to whether one or more causal relations exist between a 
first set of messages destined to the node and a second set of messages sourced from the
node and destined to at least one other node. A causal relation may exist as a function of a 
probability distribution of delay values that are differences between timestamps of 
messages in the second set and timestamps of messages in the first set. From the nodes 
and causal relations a processor-readable representation is generated. 

[006] It will be appreciated that various other embodiments are set forth in the
Detailed Description and Claims which follow.

BRIEF DESCRIPTION OF THE DRAWINGS 

[007] FIG. 1 is a block diagram of an example multi-tier data processing
arrangement in which an example causal path is illustrated in accordance with
embodiments of the invention;

[008] FIG. 2 is a graph of the example causal path from FIG. 1;

[009] FIG. 3A illustrates an example set of trace messages including two
example subsets;

[0010] FIG. 3B illustrates part of a graph with an edge representing the causal
relation inferred from FIG. 3A in accordance with embodiments of the invention;

[0011] FIG. 4 illustrates an example process for generating a graph in
accordance with various embodiments;

[0012] FIG. 5 illustrates an example process for processing a node in accordance
with embodiments of the invention;

[0013] FIG. 6 illustrates an example process for inferring causation between sets
of messages destined to a node and sets of messages sourced from the node in accordance
with embodiments of the invention;

[0014] FIG. 7 illustrates an example process for finding a correlation between
messages destined to a node and sets of messages sourced from the node in accordance
with embodiments of the invention;

[0015] FIGs. 8A and 8B illustrate indicator functions that signal occurrences of
messages versus time for messages destined to a node and sets of messages sourced from
the node, respectively;

[0016] FIG. 8C contains a graph of a cross-correlation function of the indicator
functions of FIGs. 8A and 8B;

[0017] FIG. 9 illustrates an alternative example process for inferring causation
between sets of messages destined to a node and sets of messages sourced from the node
in accordance with embodiments of the invention;

[0018] FIG. 10 illustrates an example process invoked by the process of FIG. 9
for selecting a relevant set of destination nodes based on a cross-correlation between two
sets of messages in accordance with embodiments of the invention;

[0019] FIG. 11 illustrates another alternative example process for inferring
causation between sets of messages destined to a node and sets of messages sourced from
the node in accordance with embodiments of the invention;

[0020] FIG. 12 illustrates another alternative example process for inferring
causation between sets of messages destined to a node and sets of messages sourced from
the node in accordance with embodiments of the invention; and

[0021] FIG. 13 illustrates other embodiments for selecting sets of relevant
destination nodes in accordance with embodiments of the invention.

DETAILED DESCRIPTION 

[0022] The various embodiments described herein infer causal relations of
messages sent between nodes, where the causal relations may assist in locating
performance problems. The nodes may represent host computer systems in a distributed
data processing arrangement, computer processes, threads, active objects, disk drives or
various combinations thereof. The inner functions of the constituent nodes need not be
apparent to infer the causal relations. The nodes and communication relationships may be
represented as a graph, with vertices in the graph representing the nodes and edges
between the vertices representing communications between the nodes.

[0023] The causal relations may be inferred from traced communication
information. The nodes may be determined from the traced communication information
and may be represented as vertices in a graph. An edge may be added to the graph to
connect two vertices that represent nodes that communicate, as indicated by the traced
communication information. A causal relation may be inferred between a source node and
a destination node from delays between messages destined to the source node and
messages from the source node to the destination node. Each inferred causal relation may
be represented as a delay value that is associated with the edge from the source node to the
destination node. The delay values associated with the edges may indicate potential
performance problems that merit further investigation.

[0024] It will be recognized that the various embodiments do not require trace
information resulting from the target distributed system implementing remote procedure
calls (RPCs). Furthermore, the embodiments work with the trace information that is
available, even though the trace data may be less than exhaustive due to starting and
stopping of nodes, missed messages, and tracing limitations during peak activity. The
processes also are adapted to handle trace information from nodes in which the reference
clocks are not synchronized.

[0025] FIG. 1 is a block diagram of an example multi-tier data processing
arrangement in which an example causal path is illustrated. The example tiers include
clients 102a-e, which communicate with tier-3 nodes 106a-c. The tier-3 nodes
communicate with tier-2 nodes 110a-c, and the tier-2 nodes communicate with one another
and communicate with tier-1 nodes 114a-d. The particular functions provided by the
nodes in the various tiers are application dependent. For example, one or more data
centers may host web servers that are tier-3 nodes 106a-c, application servers that are
tier-2 nodes 110a and 110b, an authentication server that is tier-2 node 110c, and database
servers that are tier-1 nodes 114a-d. Even though it is not shown, it will be appreciated
that network components, such as switches and routers, could also be nodes that are
considered in a causal path.

[0026] The solid, directional lines between the nodes represent communication
activity between the nodes. For example, line 120 represents bi-directional
communication activity between client 102e and tier-3 node 106c. The communication
events between nodes may be referred to as messages. From a collection of messages that
are exchanged between the nodes, a causal path may be inferred. For example, dashed
line 132 illustrates a hypothetical causal path. Inferring a causal path assumes that there is
some causality between messages sent between different nodes. For example, a message
directed to a first node may result in one or more messages being sent from the first node
to one or more other nodes.

[0027] The example causal path 132 represents a possible scenario in which
client 102e sends a message to tier-3 node 106c. In response, tier-3 node 106c sends a
message to tier-2 node 110c, which in turn sends a message to tier-1 node 114d. Node
114d responds with a message back to node 110c, which returns a message to node 106c.
Node 106c sends a message to tier-2 node 110b, which in turn sends a message to tier-1
node 114a. Messages are then sent back up the tiers from node 114a to node 110b, from
node 110b to node 106c, and from node 106c to client 102e. It will be appreciated that
multiple causal paths may be inferred from an input set of messages, even though only one
causal path is illustrated in FIG. 1.

[0028] The input set of messages (message trace) from which causal paths may 
be inferred may be compiled from a number of sources, for example, passive network
monitoring (for communication edges that flow between computers), kernel 
instrumentation, middleware instrumentation, or even application instrumentation. 

[0029] The information of interest in the trace messages includes a timestamp, a
source identifier, and a destination identifier. The timestamp indicates the time at which
the message was sent, the source identifier indicates the node from which the message was
sent, and the destination identifier indicates the node to which the message was sent. The
timestamps of different source nodes (sources for brevity) need not have the same time
reference. In some distributed systems, the collecting of trace messages may be
distributed among nodes of the system so that different sources are monitored by different
entities having local clocks. These clocks need not be synchronized with each other; the
clocks need only have approximately the same rate, which is the case if they accurately
measure intervals of real time.
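
As a minimal sketch of the trace record described above, the three fields can be held in a small immutable record and grouped by source or by destination. The names `TraceMessage` and `group_by` and the node labels are illustrative assumptions, not part of the disclosure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceMessage:
    timestamp: float   # send time, read from the clock of whoever recorded it
    source: str        # node the message was sent from
    destination: str   # node the message was sent to


def group_by(trace, field):
    """Group trace messages by the named field ("source" or "destination")."""
    groups = defaultdict(list)
    for message in trace:
        groups[getattr(message, field)].append(message)
    return dict(groups)


# Hypothetical three-message trace.
trace = [
    TraceMessage(0.0, "client", "web"),
    TraceMessage(0.4, "web", "app"),
    TraceMessage(0.9, "web", "db"),
]
by_source = group_by(trace, "source")
```

Grouping by source gives, for each node, the messages that node sent; grouping by destination gives the messages that arrived at it.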

[0030] FIG. 2 is a graph of the example causal path 132 from FIG. 1. Each of
the vertices is labeled with the reference number of the represented node from FIG. 1. For
example, the first vertex in the graph is 102e, which represents client node 102e from FIG.
1. Edge 142 connects vertex 110c to vertex 114d and represents a causal relation inferred
between messages arriving at node 110c and messages from node 110c to node 114d. A
node may be represented by more than one vertex in a graph depending on the inferred
causal relations. For example, each of nodes 106c, 110b, and 110c is represented with two
vertices in the graph because two causal relations are inferred for each of the nodes.

[0031] Whether a causal relation is inferred between messages from a source
node to a destination node and messages arriving at the source node depends on a
probability distribution of differences between timestamps of messages from the source to
the destination and timestamps of messages arriving at the source. For example, in one
embodiment if some number of messages from the source to the destination have
timestamps indicating approximately equal delays relative to the timestamps of some
messages received by the source, a causal relation is inferred between the source and the
destination. It will be appreciated that other embodiments may use different probability
distributions depending on application-specific patterns of communication between nodes.
In one embodiment, the causal relation is represented with an edge in the graph, and the
most frequent delay is associated with the edge. For example, the edge from vertex 114d to
vertex 110c has an associated delay.

[0032] FIGs. 3A and 3B illustrate the inference of a causal relation between
nodes from a set of traced messages 160. FIG. 3A illustrates an example set of trace
messages, with two example subsets shown in blocks 162 and 164, respectively. Subset
162 includes trace messages destined to node 110b, and subset 164 includes trace
messages from node 110b and destined to node 114a.

[0033] The purpose of presenting the example timestamps is to illustrate a causal 
relation that may be inferred from messages destined to a node and messages sourced from 
the node and destined to another node. It will be appreciated that many additional trace 
messages would likely be present in a complete set of trace data. However, only messages
of interest are shown in order to simplify the figures.

[0034] From the timestamps it may be observed that each value in subset 164 is
offset by approximately 3.000 seconds from one of the values in subset 162. Because
there are a number of corresponding messages in subsets 162 and 164 for which this
relation is true, it may be inferred that for at least some messages there is approximately a
3.000 second delay between the occurrence of a message destined to node 110b and the
occurrence of a message sourced from node 110b and destined to node 114a. In other
words, an inference may be made that some messages to node 110b cause messages to be
generated to node 114a with a delay of approximately 3.000 seconds.
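
The observation above can be illustrated with a small calculation. The sketch below bins every pairwise timestamp difference between two sets and reports the most frequent bin. This is a simple stand-in for illustration only, not the cross-correlation procedure of the later figures, and the timestamps are hypothetical values rather than the values of FIG. 3A.

```python
from collections import Counter


def most_common_delay(arrivals, departures, resolution=0.01):
    """Return (delay, count) for the most frequent pairwise difference.

    Differences are rounded to the nearest multiple of `resolution`,
    an assumed binning parameter.
    """
    counts = Counter()
    for a in arrivals:
        for d in departures:
            counts[round((d - a) / resolution)] += 1
    bin_index, count = counts.most_common(1)[0]
    return bin_index * resolution, count


# Hypothetical timestamps: three departures follow an arrival by about 3 s.
arrivals = [10.000, 12.500, 17.250, 21.000]
departures = [13.001, 15.499, 20.250, 26.700]
delay, count = most_common_delay(arrivals, departures)
```

With these values, three of the sixteen pairwise differences fall in the same bin near 3 seconds, while every other difference is isolated.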

[0035] FIG. 3B illustrates part of a graph having an edge from vertex 110b to
vertex 114a. The edge from vertex 110b to vertex 114a represents a causal relation
inferred from the subsets 162 and 164 of trace messages. The value 3.000 seconds is
associated with the edge to represent the approximate delay found to occur
between the timestamps of subsets 162 and 164.

[0036] FIGs. 4-13 describe various embodiments of processes for inferring
causal relations between nodes and generating a graph that represents the causal relations.
FIG. 4 illustrates an example process 200 for generating a graph in accordance with
various embodiments. The process generally considers messages targeted at a given node
and the messages sourced from that node to find the likely candidate nodes for the next
node in the path, and then recurses for each candidate node. The next candidate nodes are
found by the aggregation of multiple messages, rather than by examining one event at a
time. The process seeks trends in this aggregation, while ignoring messages that are
inconsistent with the trend.

[0037] The output of the process is a graph representation (output graph) of the
causal paths that are inferred. The output graph begins empty (step 202). T_i is the subset
of trace messages with source node i as determined from the trace of all messages (step
204).

[0038] The process begins with an initial node, represented by the variable
initial_node, and a vertex is added to the output graph (step 206) to represent the initial
node. The process then considers the messages sourced from initial_node (represented
by T_initial_node). For each destination node, j, found among these messages (step 208),
there is a causal relation between the initial node and j, and an edge is added from the
vertex x_initial_node to a vertex x_j that represents node j (step 212). This edge is labeled
with a zero delay because, by definition, the initial node does not delay messages.

[0039] The process then determines how the path continues from node j by
invoking the Process_Node process (step 214), inputting node j, vertex x_j, and V (from
step 210), which is the set of messages in T_initial_node having a destination of node j.
The Process_Node process 250 (FIG. 5) calls the Find_Caused_Messages process 300
(FIG. 6) with inputs V and T_j to find the subset of messages of T_j that are caused by the
messages of V (step 252). Rather than return all the caused messages in a single group,
the Find_Caused_Messages process splits those messages into multiple subsets. More
precisely, Find_Caused_Messages returns a list, O_1, O_2, ..., O_m, where each O_i
represents the caused messages in T_j that have a common delay and destination node. Each
element O_i in this list has three fields: O_i.messages contains messages of T_j, O_i.delay is
the common delay for the messages in O_i.messages, and O_i.node is the common
destination node for the messages in O_i.messages. Then, for each such O_i, Process_Node
creates a new vertex for O_i.node, and adds an edge to that vertex from x_j labeled O_i.delay
(steps 256, 258). Finally, Process_Node calls itself recursively to find out how the path
continues from O_i.node (step 260).
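
The recursion of FIGs. 4 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: `build_graph`, `Caused`, `fixed_delay_matcher`, and the `max_depth` guard are hypothetical names introduced here, and the matcher is a trivial stand-in for Find_Caused_Messages that accepts only one known delay.

```python
from collections import defaultdict, namedtuple

Message = namedtuple("Message", "timestamp source destination")
Caused = namedtuple("Caused", "messages delay node")  # one list element O_i


def build_graph(trace, initial_node, find_caused_messages, max_depth=10):
    by_source = defaultdict(list)          # T_i: messages whose source is i
    for m in trace:
        by_source[m.source].append(m)

    vertices = [initial_node]              # a vertex id is an index into this list
    edges = []                             # (from_vertex, to_vertex, delay)

    def add_vertex(node):
        vertices.append(node)
        return len(vertices) - 1

    def process_node(node, vertex, v_msgs, depth):
        if depth >= max_depth:             # assumed guard against runaway recursion
            return
        for o in find_caused_messages(v_msgs, by_source[node]):
            child = add_vertex(o.node)
            edges.append((vertex, child, o.delay))
            process_node(o.node, child, o.messages, depth + 1)

    for dest in sorted({m.destination for m in by_source[initial_node]}):
        v = [m for m in by_source[initial_node] if m.destination == dest]
        child = add_vertex(dest)
        edges.append((0, child, 0.0))      # the initial node adds zero delay
        process_node(dest, child, v, 1)
    return vertices, edges


def fixed_delay_matcher(delay, tol=1e-6):
    """Stand-in matcher: keep messages that trail some arrival by `delay`."""
    def find(v_msgs, z_msgs):
        arrivals = [m.timestamp for m in v_msgs]
        groups = defaultdict(list)
        for z in z_msgs:
            if any(abs(z.timestamp - a - delay) <= tol for a in arrivals):
                groups[z.destination].append(z)
        return [Caused(msgs, delay, node) for node, msgs in sorted(groups.items())]
    return find


trace = [
    Message(0.0, "client", "web"), Message(10.0, "client", "web"),
    Message(1.0, "web", "app"), Message(11.0, "web", "app"),
    Message(2.0, "app", "db"), Message(12.0, "app", "db"),
]
vertices, edges = build_graph(trace, "client", fixed_delay_matcher(1.0))
```

Because a new vertex is created for every list element, a node reached through two different causal relations appears as two vertices, as in FIG. 2.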

[0040] FIG. 6 illustrates an example process 300 (Find_Caused_Messages) for
inferring causation between sets of messages destined to a node and sets of messages
sourced from the node. The input to the process includes sets of messages V and Z. All
the messages in V have the same destination node, j, and all messages of Z have j as the
source node. The Find_Caused_Messages process may be understood by way of example.

[0041] If for some duration of time d (causal delay), there are some number of
messages in Z that appear exactly time d after some message in V, then it may be
inferred that those messages in Z are caused by those messages in V. The process attempts
to find these causal delays (steps 304, 306), and for each causal delay found (step 308) the
process finds the set Z_0 of events in Z corresponding to events in V time shifted by d (step
310). The time shift may be forward (d >= 0) or backward (d <= 0). A backward shift
may occur if, during the gathering of the trace, there was lack of synchronization among
clocks of different nodes. The process separates the messages in Z_0 by destination node:
for each such node, a new list element O_i is created (steps 312, 314, 316).

[0042] The Find_Caused_Messages process invokes the Find_Correlation process
350 of FIG. 7 to find a cross-correlation between messages destined to a node and subsets
of messages sourced from the node. The process begins by converting input message sets
V and Z into indicator functions s_1(t) and s_2(t), respectively (steps 352, 354). In
one embodiment, indicator function s_1(t) is defined as:

\[ s_1(t) = \begin{cases} 1 & \text{if } V \text{ has a message at a time in } [t - \epsilon,\ t + \epsilon] \\ 0 & \text{otherwise} \end{cases} \]

where \epsilon is a selected small, fixed constant. The events of Z are similarly converted into
indicator function s_2(t). FIGs. 8A and 8B illustrate example graphs of indicator functions.

[0043] If a causal delay d exists, then s_2(t) will include a copy of s_1(t) shifted in
time by d. Any causal delay(s) between s_1(t) and s_2(t) is characterized by computing a
cross-correlation function (step 356). In one embodiment, the cross-correlation function
C(t) is the convolution of s_2 and the time inverse of s_1. In general terms, C(t) will have a
spike at d if and only if s_2(t) contains a copy of s_1(t) time shifted by d.

[0044] It will be appreciated that the time inverse of a function s(t) is s(-t). The
convolution of two functions f(t) and g(t) is another function, denoted f \otimes g, which is
defined as:

\[ (f \otimes g)(t) = \int_{-\infty}^{+\infty} f(u)\, g(t-u)\, du. \]

[0045] The discrete version of the convolution is:

\[ (f \otimes g)_i = \sum_{j=-\infty}^{+\infty} f_j\, g_{i-j}. \]

[0046] If f is replaced with s_2(t) and g is replaced with the time inverse of s_1(t),
that is, s_1(-t), the following formula results and is used for C(t) in one embodiment:

\[ C(t) = \int_{-\infty}^{+\infty} s_1(u-t)\, s_2(u)\, du. \]
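
The substitution can be checked in one line. Writing g(x) = s_1(-x) for the time inverse of s_1 and f = s_2 in the convolution definition gives the following, where the last line assumes, for illustration, that s_2 is exactly a copy of s_1 delayed by d:

\[
\begin{aligned}
(f \otimes g)(t) &= \int_{-\infty}^{+\infty} s_2(u)\, g(t-u)\, du
                  = \int_{-\infty}^{+\infty} s_2(u)\, s_1(u-t)\, du = C(t), \\
s_2(u) = s_1(u-d) \;&\Rightarrow\; C(d) = \int_{-\infty}^{+\infty} s_1(u-d)^2\, du .
\end{aligned}
\]

Under that assumption the integrand at t = d is a square, so the two functions overlap completely there, which is why a spike is expected at the causal delay.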

[0047] The discrete version is:

\[ C_i = \sum_{j=-\infty}^{+\infty} (s_1)_{j-i}\, (s_2)_j. \]
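
A direct, unoptimized sketch of the discrete cross-correlation follows. It assumes the indicator functions are finite lists that are zero outside their range, and that only shifts up to a chosen `max_shift` (a parameter introduced here) are of interest.

```python
def cross_correlation(s1, s2, max_shift):
    """Return {i: C_i} with C_i = sum over j of s1[j - i] * s2[j]."""
    result = {}
    for shift in range(-max_shift, max_shift + 1):
        total = 0
        for j, value in enumerate(s2):
            k = j - shift                  # index into s1
            if 0 <= k < len(s1):           # s1 is treated as zero elsewhere
                total += s1[k] * value
        result[shift] = total
    return result


# s2 is s1 delayed by two time quanta.
s1 = [1, 0, 0, 1, 0, 0, 0, 0]
s2 = [0, 0, 1, 0, 0, 1, 0, 0]
c = cross_correlation(s1, s2, max_shift=5)
best_shift = max(c, key=c.get)
```

For long traces a fast Fourier transform would normally replace the double loop; the loop is kept here because it mirrors the formula term by term.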

[0048] Returning now to the Find_Caused_Messages process 300 of FIG. 6,
once the cross-correlation function is determined (step 304), the process finds the positions
of the spikes in the cross-correlation function (step 308). In an example embodiment, the
spikes are determined by computing the mean and standard deviation of C, and defining a
spike to exist if it is N_1 standard deviations above the mean, where N_1 is a small fixed
constant, for example, 1. The spike may gradually increase before reaching the peak. To
avoid multiple detections of the same spike in nearby points in one embodiment, a spike
must fall below N_0 standard deviations before another spike is detected, where N_0 < N_1
and N_0 is another small constant value, for example, 0.8. Other methods of finding spikes
in a function may also be suitable.



200310090-1 

[0049] Each spike may yield many candidate points. These candidate points
run from the point at which C rises N_1 standard deviations above the mean to the point at
which C falls below the N_0 threshold. Among these candidate points, the point with the
largest value is selected to represent the spike. Because the position of the spike is the
value of interest, the value d such that C(d) is a spike is sought.
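
The two-threshold rule can be sketched as below. This assumes that both thresholds are measured as standard deviations above the mean and that the population standard deviation is used; the source does not settle either point. Positions are list indices, so a caller whose list starts at a negative shift would add that offset.

```python
from statistics import mean, pstdev


def find_spikes(c, n1=1.0, n0=0.8):
    """Return the index of the largest value within each detected spike."""
    mu, sigma = mean(c), pstdev(c)
    if sigma == 0:
        return []                          # flat function: no spikes
    high, low = mu + n1 * sigma, mu + n0 * sigma
    spikes, best = [], None
    for pos, value in enumerate(c):
        if best is None:
            if value >= high:              # a new spike starts here
                best = pos
        elif value < low:                  # spike has ended; record its peak
            spikes.append(best)
            best = None
        elif value > c[best]:              # still inside the spike; track the peak
            best = pos
    if best is not None:                   # spike still open at the end of the data
        spikes.append(best)
    return spikes


values = [0, 0, 1, 5, 2, 0, 0, 0, 4, 6, 3, 0, 0, 0, 0, 0]
peaks = find_spikes(values)
```

In the example, the points at indices 8, 9, and 10 all belong to one spike, and only the peak at index 9 is reported.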

[0050] FIGs. 8A and 8B illustrate indicator functions that signal occurrences of
messages versus time for messages destined to a node and sets of messages sourced from
the node, respectively. FIG. 8C contains a graph of a cross-correlation function of the
indicator functions of FIGs. 8A and 8B. FIG. 8A illustrates the indicator function s_1(t),
and FIG. 8B illustrates the indicator function s_2(t). It may be observed that s_2(t) includes a
version of s_1(t) that is time shifted by d.

[0051] The cross-correlation function of s_2(t) and s_1(t) is shown in FIG. 8C. The
graph illustrates that at a delay of d, there is a spike. The spike shows that some number
of messages in Z are time shifted by d relative to a corresponding number of messages in
V.

[0052] Also, for a given time t', C(t') approximates the amount of overlap
between s_1 and s_2 when s_1 is shifted by t'. Overlap refers to the times at which both s_1
shifted by t' and s_2 are non-zero. For example, in FIG. 8C, if t' = 0 then there is no overlap
because there are no times when both s_2 and s_1 are non-zero. If t' is a small value then
hump 372 of s_2 (FIG. 8B) will overlap with hump 374 of s_1 (FIG. 8A) because when s_1 is
shifted right by a slight amount, there will be times when both s_2 and s_1 are simultaneously
non-zero. This overlap is maximum when t' = d, which is where all humps of s_1 overlap
with all humps of s_2.

[0053] FIG. 9 illustrates another embodiment of a Find_Caused_Messages
process 400. Relative to the Find_Caused_Messages process 300 of FIG. 6, the
Find_Caused_Messages process 400 of FIG. 9 first splits the set of messages Z into
smaller subsets based on destination node, and then finds the cross-correlation between V
and each of the smaller subsets.

[0054] The Find_Related_Nodes process 450 (FIG. 10) is invoked (step 404) to
return a set of destination nodes that might be of interest. Because Z may have a very
large number of destination nodes, the overall process may be slow. Therefore, rather than
processing every destination node, a subset of destination nodes is selected based on the
likelihood of being relevant.

[0055] For each destination node j of interest (step 406), the process selects a
subset Z_0 of messages from Z with j as the destination node (step 408). The
Find_Correlation process 350 is then invoked with V and Z_0 (step 410), and the positions
of the spikes in the cross-correlation function are found (step 412). For each spike
position d (step 414), a subset Z_1 of messages is constructed to include messages in Z_0
having timestamps equal to timestamps of messages in V shifted by d. A new list element
O_i is created for the destination node, delay value, and subset of messages (step 420).

[0056] FIG. 10 illustrates an example Find_Related_Nodes process 450 for
selecting a relevant set of destination nodes based on a cross-correlation between two sets
of messages. The process is similar to the Find_Caused_Messages process 300 of FIG. 6.
The Find_Related_Nodes process finds a cross-correlation function (step 454) and finds
spikes in the cross-correlation function (step 456). For each spike position d (step 458),
the process gets the messages in Z having timestamps equal to timestamps in V shifted by
d (step 460) and collects in the Nodes set the destination nodes of the messages (step 464).

[0057] FIG. 11 illustrates yet another embodiment of a Find_Caused_Messages
process 500. The accuracy of the Find_Caused_Messages process may be further
improved by searching for only one spike at a time rather than multiple spikes. The
Find_Caused_Messages process 500 searches for only the position d of the largest spike
(the maximum value of C(t)).

[0058] The Find_Caused_Messages process 500 begins (steps 502, 504, 506,
508) in a manner similar to the Find_Caused_Messages process 400 (FIG. 9). The process
then tracks a set V_0 of messages, which is initially set to V (step 510). When the number
of messages in the smaller of V_0 and Z_0 falls below a selected minimum size (for
example, 20 or 30 messages), the While loop (step 512) is exited (step 514) and the list
O_1, O_2, ..., O_m is returned. The process finds a cross-correlation function for V_0 and
Z_0 (step 516). If the maximum value of the cross-correlation function is not prominent, the
loop is exited (step 518). A prominent value may be defined as a value that is a selected
number of standard deviations, for example 2 or 3, above the mean.

[0059] The process then establishes d as the position of the maximum of the
cross-correlation function (step 520). Z_1 is assigned the subset of messages in Z_0 having
timestamps equal to timestamps of messages in V_0 shifted by d (step 522). V_1 is assigned
the subset of messages in V_0 having timestamps equal to timestamps of messages in Z_1
shifted by -d (step 524). The list element O_i is then updated (step 526), and messages in
V_1 and Z_1 are removed from sets V_0 and Z_0, respectively (step 528).
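
The one-spike-at-a-time loop can be sketched for a single destination node as follows. Several simplifications are assumptions made for illustration: timestamps are integers already expressed in time quanta, each timestamp occurs at most once, matching is exact, and `min_size` is set to 3 for the toy data rather than the 20 or 30 mentioned above. The function name `peel_delays` is hypothetical.

```python
from statistics import mean, pstdev


def peel_delays(v_times, z_times, min_size=3, prominence=2.0, max_shift=50):
    """Repeatedly take the largest cross-correlation spike and remove
    the messages it explains. Returns [(delay, matched z timestamps)]."""
    v_left, z_left = set(v_times), set(z_times)
    found = []
    while min(len(v_left), len(z_left)) >= min_size:
        # With binary indicators, C[d] counts pairs (v, z) with z - v == d.
        c = {d: 0 for d in range(-max_shift, max_shift + 1)}
        for v in v_left:
            for z in z_left:
                if -max_shift <= z - v <= max_shift:
                    c[z - v] += 1
        values = list(c.values())
        mu, sigma = mean(values), pstdev(values)
        d = max(c, key=c.get)                       # position of the maximum
        if sigma == 0 or c[d] < mu + prominence * sigma:
            break                                   # maximum is not prominent
        z1 = {z for z in z_left if z - d in v_left}  # Z_1: V_0 shifted by d
        v1 = {z - d for z in z1}                     # V_1: Z_1 shifted by -d
        found.append((d, sorted(z1)))
        v_left -= v1
        z_left -= z1
    return found


# Hypothetical arrivals; five are forwarded after 3 quanta, three after 7.
v = [0, 4, 15, 21, 33, 46, 58, 71]
z = [3, 7, 18, 24, 36, 53, 65, 78]
result = peel_delays(v, z)
```

Removing the messages explained by the first delay leaves a smaller, cleaner problem in which the second delay stands out clearly.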

[0060] FIG. 12 illustrates yet another embodiment of a Find_Caused_Messages
process 550. The previously described Find_Caused_Messages processes 300, 400, and
500 may be improved further to address the following issue. Suppose that when a
message arrives at a node j, node j delays by either d_1 or d_2, with probability 1/2 each,
before generating a message to a destination node. Then, for each delay d_1 or d_2, only
about half of the messages that arrive at node j leave node j with that delay. Thus, when the
Find_Caused_Messages process reaches node j, it will tend to return subsets O_i that have
half as many messages as V. If there are many nodes similar to j, then as the process
progresses further in the causal path, there will be fewer and fewer messages available for
consideration. With fewer messages to consider, accuracy may be impaired.

[0061] A second issue is that the output graph for node j will have two outgoing
edges, one for d_1 and another for d_2. If there are many nodes like j, an exponential
increase in the number of nodes may result, which will likely clutter the output graph.

[0062] The Find_Caused_Messages process 550 addresses these issues by
merging together the sets Z_1 for different delays (steps 562 and 580) and then creating a
single O_i for all delays. It will be appreciated that the definition of O_i is slightly modified
to encompass a set of delays rather than a single delay (steps 564 and 586). The
consequence is that each edge of the graph will be labeled with a set of one or more delay
values rather than a single delay value. It can be seen that the remaining steps of the
Find_Caused_Messages process 550 are the same as corresponding steps in the
Find_Caused_Messages process 500 (FIG. 11).
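
The merging idea can be sketched independently of the correlation steps. The function below assumes the per-spike results are available as (node, delay, messages) triples, which is a representation chosen here for illustration, and folds them into one element per destination node carrying a set of delays.

```python
from collections import defaultdict


def merge_delays(elements):
    """Merge per-delay results into one (node, delays, messages) per node."""
    delays = defaultdict(set)
    messages = defaultdict(list)
    for node, delay, msgs in elements:
        delays[node].add(delay)
        messages[node].extend(msgs)
    return [(node, sorted(delays[node]), messages[node]) for node in sorted(delays)]


# Hypothetical per-spike results; message names are placeholders.
per_spike = [
    ("db", 3, ["m1", "m2"]),
    ("db", 7, ["m3"]),
    ("auth", 2, ["m4"]),
]
merged = merge_delays(per_spike)
```

Node "db" then contributes a single outgoing edge labeled with the delays 3 and 7, and all three of its messages remain available to the next recursion step.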

[0063] FIG. 13 illustrates another embodiment for selecting sets of relevant
destination nodes according to a Find_Related_Nodes process 600. It will be appreciated
that the teachings of the embodiment of the Find_Caused_Messages process 500 (FIG. 11)
may be adapted to improve the Find_Related_Nodes process as illustrated by process 600.
A compromise between speed and accuracy may be obtained by finding multiple
destination nodes that occur frequently rather than finding just one frequently occurring
destination node.

[0064] The set of Nodes to be returned begins as an empty set (step 602), and V_0
and Z_0 are initialized to the input sets V and Z, respectively. A processing loop (step 606)
is performed until certain exit conditions occur. The processing of steps 608, 610, 612,
614, and 616 is similar to the processing described for steps 514, 516, 518, 520, and 522 in
the Find_Caused_Messages process 500 (FIG. 11).

[0065] Node i is established as the node that is the most frequent destination in
the Z_1 subset of messages (step 618). Node i is then added to the output set of Nodes (step
620), and the Z_2 subset of messages is assigned the messages in the Z_1 subset of messages
having as the destination node i (step 622). A V_2 subset of messages is established as the
messages in V_0 having timestamps equal to timestamps of messages in Z_2 shifted by -d
(step 626). Messages in V_2 and Z_2 are then removed from sets V_0 and Z_0, respectively (step
626). Upon exit from the loop, the set of Nodes is returned (step 628).

[0066] The following discussion describes various additional alternative
embodiments. In one alternative embodiment, discrete indicator functions are provided to
function with discrete variables rather than continuous variables in order to improve
processing efficiency. A time quantum, p, is chosen and t is treated as an integer that
represents multiples of p. The various parts of the processes that use t may be modified
accordingly. The indicator functions s_1(t) and s_2(t) are redefined as follows:

\[ s_1(t) = \begin{cases} 1 & \text{if } V \text{ has a message during times } [tp,\ (t+1)p] \\ 0 & \text{otherwise} \end{cases} \]

where t is an integer.

[0067] The s1 indicator function may be alternatively defined as:

    s1(t) = number of messages in V during times [tp, (t+1)p]

where t is an integer. Yet another alternative is to define the s1 indicator function as:

    s1(t) = square root of the number of messages in V during times [tp, (t+1)p]

where t is an integer.

[0068] It will be appreciated that the s2 indicator function may be similarly
defined using Z instead of V as the message set. For either alternative, the indicator
functions may be represented by arrays in which each cell in the array represents a time
quantum and the value of the cell represents the number of messages in the time quantum.
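As a non-limiting sketch, the three discrete indicator variants may be computed as arrays in a single pass over the message timestamps. The function name indicator_arrays and the parameter n_cells are hypothetical. Each message is assigned here to exactly one cell using a half-open interval, which is an assumption made to avoid counting a message on a cell boundary twice.

```python
import math


def indicator_arrays(timestamps, p, n_cells):
    """Build binary, count, and square-root indicator arrays for quantum p.

    Cell t covers times [t*p, (t+1)*p); the half-open interval is an
    assumption so that every message falls in exactly one cell.
    """
    counts = [0] * n_cells
    for ts in timestamps:
        t = int(ts // p)
        if 0 <= t < n_cells:
            counts[t] += 1
    binary = [1 if c > 0 else 0 for c in counts]
    root = [math.sqrt(c) for c in counts]
    return binary, counts, root


binary, counts, root = indicator_arrays([0.1, 0.2, 1.5, 3.9], p=1.0, n_cells=4)
```

The same helper would be applied to the timestamps of Z to obtain the corresponding arrays for the second indicator function.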

[0069] Another embodiment addresses the situation where the delays are not
uniform. The method involves convolving the indicator functions with an arbitrary curve
that may model the variance of delays. For example, the curve could be a Gaussian
function or a triangle centered at the origin. This change may be implemented after step
354 in the Find_Correlation process 350 (FIG. 7), by adding the following code:

    s1 := convolution(s1, curve)
    s2 := convolution(s2, curve)

[0070] If the curve is a triangle or a Gaussian, the width of the triangle or the 
variance of the Gaussian can be input parameters to the process, and the chosen value may 
depend on the maximum variance expected for the delays. Application of this process 
may smooth spikes in the indicator functions with the expectation that there will be spikes 
in the cross-correlation of s1(t) and s2(t) even when there are variations in the delay.
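The smoothing may be sketched, for discrete indicator arrays, as a direct convolution with a sampled triangle. The helper names triangle and convolve_same are hypothetical, and normalizing the curve so that its samples sum to one is an assumption made here for convenience rather than a requirement of the process.

```python
def triangle(half_width):
    """Triangle centered at the origin, sampled at integer offsets.

    The samples are normalized to sum to 1 (an assumption of this sketch).
    """
    raw = [half_width + 1 - abs(k) for k in range(-half_width, half_width + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def convolve_same(signal, curve):
    """Discrete convolution trimmed to the length of the input signal.

    The curve is assumed to have odd length with its center at the origin.
    """
    half = len(curve) // 2
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0:
            continue  # empty cells contribute nothing
        for k, c in enumerate(curve):
            j = i + k - half
            if 0 <= j < len(out):
                out[j] += s * c
    return out


smoothed = convolve_same([0, 0, 1, 0, 0], triangle(1))
```

A single message in the middle cell is spread over its neighbors, so two indicator arrays whose spikes are offset by a slightly varying delay would still overlap when cross-correlated.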

[0071] Variances in the delay may also be addressed by considering messages 
having delays that are close to one another when collecting subsets of messages in the 
Find_Caused_Messages processes 300, 400, and 500. For example, various steps in these
processes are similar to: 

X := messages in Y having timestamps equal to timestamps of messages in 
Z shifted by d 

where X, Y, and Z vary according to the particular version of and location within the
process. The respective steps may be modified as: 

X := messages in Y having timestamps within a of timestamps of messages 
in Z shifted by d 

where a is a parameter with a value selected according to the magnitude of the delay 
variations. 
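One possible way to realize the modified step is shown below as a sketch operating on bare timestamps. The function name within_tolerance is hypothetical, and the use of a sorted list with binary search is merely one implementation choice for finding nearby timestamps.

```python
from bisect import bisect_left, bisect_right


def within_tolerance(y_times, z_times, d, a):
    """Return the Y timestamps lying within a of some Z timestamp shifted by d."""
    targets = sorted(t + d for t in z_times)
    matched = []
    for y in y_times:
        # Any target in the closed window [y - a, y + a] counts as a match.
        lo = bisect_left(targets, y - a)
        hi = bisect_right(targets, y + a)
        if lo < hi:
            matched.append(y)
    return matched


x = within_tolerance([100, 104, 120, 201], [50, 150], d=50, a=5)
```

With a set to zero the helper reduces to the exact-match form of the step.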

[0072] In various other embodiments, the processes are adapted to address 
undesirable causal paths that may be found. That is, some causal paths may be inferred 
from the trace messages when in fact there is no causality. This may be problematic if the 
number of undesirable paths is so large that the useful paths are obscured. One way to 
deal with undesirable paths is to remove paths that are caused by a small number of 
messages. An alternative approach is to weight the edges based on the number of 
messages from which a causal relation is inferred. 

[0073] To avoid undesirable paths, the Process_Node process 250 (FIG. 5) may
be modified to avoid processing subsets O_i that have too few events. This may be
accomplished by conditioning the creation of a vertex and the recursive invocation of
the Process_Node process 250 (FIG. 5) on the number of messages. Specifically, steps 258
and 260 of the Process_Node process 250 may be changed as follows:

    if |W| > MinSize then
        add a new vertex x_k labeled k and edge (x_j, x_k) labeled d in the output graph
        Process_Node(k, x_k, W)

where MinSize is a constant, for example, in the range of 20 to 30.

[0074] In another embodiment, the Process_Node process 250 may be adapted 
to prevent formation of paths that may be undesirably short. For example, if the traced 
messages indicate a causal path A → B → C → B → A, then the causal path A → B → A is
also likely to be detected because there are messages from B to A after some reasonably
regular delay of the messages from A to B. This situation may be addressed by discarding 
edges that immediately return to the previous node in the path. However, it is desired for 
the process to still allow, in the example above, the edge C → B. Thus, such an edge is
allowed to return to the previous node only if there are no other edges leaving the current
node. The Process_Node process 250 may be adapted to take an extra input, prev_node: 
    Process_Node(prev_node, j, x_j, V)

Step 260 is modified to pass this extra parameter to Process_Node:

    Process_Node(j, k, x_k, W)

Step 214 is similarly modified:

    Process_Node(dummy, j, x_j, V)

where dummy is the name of a node that does not exist, since there is no previous node. 
Lastly, steps 258 and 260 of the Process_Node process are modified as follows: 
    if (k ≠ prev_node) or
       (O_i.node is equal to k for every i) then
        create vertex x_k labeled k and edge (x_j, x_k) labeled d in the output graph
        Process_Node(j, k, x_k, W)
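The return-edge condition may be illustrated by the following sketch. The function name allow_edge and the list candidate_nodes, which is assumed to hold the destination node of every candidate edge leaving the current node, are hypothetical.

```python
def allow_edge(k, prev_node, candidate_nodes):
    """Decide whether an edge from the current node to node k may be created.

    An edge back to prev_node is permitted only when every candidate edge
    leaving the current node leads to k, that is, when no other edge exists.
    """
    only_option = all(node == k for node in candidate_nodes)
    return k != prev_node or only_option


# At node B (reached from A) with candidates A and C, the return to A is refused.
refused = allow_edge("A", prev_node="A", candidate_nodes=["A", "C"])
# At node C (reached from B) the only candidate is B, so the return is allowed.
allowed = allow_edge("B", prev_node="B", candidate_nodes=["B"])
```

For the initial invocation, passing a previous node name that matches no real node leaves every candidate edge permitted.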

[0075] Negative time shifts may also be found if the cross-correlation function C
has a spike at a negative value of t. This may be a result of statistical chance or different
time references in the timestamps and may result in inaccuracies in the output graph. If it
is known that the time references were synchronized during collection of the trace
messages, then negative time shifts may be ignored because the negative values will be
due to statistical chance. The Process_Node process 250 may be modified to condition
execution of steps 258 and 260 on d > 0. That is, if d > 0 then steps 258 and 260 are
executed.

[0076] The output graph may have cycles that repeat many times if the trace
messages exhibit periodic behaviors. For example, suppose that there is a message from A
to B every 1 second for 1 hour, and similarly from B to C and from C to A. Then the
process may infer that the messages from A to B are causing the messages from B to C,
which in turn are causing the messages from C to A, which cause messages from A to B, and
so on. The output graph would have a long cycle: A → B → C → A → B → C ... This
output is undesirable because it might obscure useful information or cause the process to
execute for a very long time since the run-time is a function of the size of the output graph.

[0077] An example embodiment to address this situation is to impose a
maximum number of times, M, that a node can repeat itself in a path, where M is a small
constant, for example, 4 or 5. This may be implemented by adapting the Process_Node
process 250 to condition execution of steps 258 and 260 on whether node k has appeared
fewer than M times in the path from x_initial_node to x_j.
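The conditions described above for suppressing undesirable edges (a minimum number of supporting messages, a positive time shift when clocks are known to be synchronized, and a cap on node repetitions) may be combined into a single guard, sketched below. The function name should_add_edge, its parameters, and the default values are hypothetical; the defaults merely fall within the example ranges mentioned above.

```python
def should_add_edge(w_size, d, k, path_nodes,
                    min_size=20, max_repeats=4, clocks_synchronized=True):
    """Return True if the vertex and edge for node k should be created.

    w_size     -- number of messages supporting the candidate edge
    d          -- time shift at which the correlation spike was found
    path_nodes -- node labels on the path from the initial node so far
    """
    if w_size <= min_size:
        return False  # too few messages to trust the inferred causality
    if clocks_synchronized and d <= 0:
        return False  # non-positive shifts attributed to statistical chance
    if path_nodes.count(k) >= max_repeats:
        return False  # node k has already appeared the maximum number of times
    return True


ok = should_add_edge(25, 3, "A", ["A", "B", "C"])
```

Keeping the checks in one predicate leaves the recursive structure of the process unchanged; only the condition in front of the vertex creation and the recursive call differs.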

[0078] In another embodiment, weights may be associated with edges so that 
relevant information may be later separated from irrelevant information. This may be 
implemented by changing step 258 of the Process_Node process 250 to add a weight to the 
edge. The weight may be either a function of the number of messages in W or a function
of the quality of the spike that originated the edge. “Quality” may be defined in a variety 
of ways, for example, by the number of standard deviations separating the mean and the 
spike. The weight may alternatively be a function of both the number of messages in W 
and the quality of the spike. 
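As one hedged illustration, the quality of a spike may be computed as the number of standard deviations by which the spike exceeds the mean of the cross-correlation values. The function names spike_quality and edge_weight are hypothetical, the mean and standard deviation are taken here over all samples including the spike (an assumption), and the product used to combine message count and quality is only one of many possible weighting functions.

```python
from statistics import mean, pstdev


def spike_quality(correlation, spike_index):
    """Number of standard deviations separating the spike from the mean."""
    mu = mean(correlation)
    sigma = pstdev(correlation)
    if sigma == 0:
        return 0.0  # a flat correlation has no meaningful spike
    return (correlation[spike_index] - mu) / sigma


def edge_weight(num_messages, quality):
    """Combine message count and spike quality (product chosen as an assumption)."""
    return num_messages * quality


q = spike_quality([1, 1, 1, 1, 6], spike_index=4)
w = edge_weight(30, q)
```

Edges whose weight falls below a chosen threshold could then be filtered from the output graph after the process completes.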

[0079] Those skilled in the art will appreciate that various alternative data
processing and/or processor arrangements would be suitable for hosting the processes and
data structures of the different embodiments of the present invention. In addition, the
processes may be provided via a variety of computer-readable media or delivery channels
such as magnetic or optical disks or tapes, electronic storage devices, or as application
services over a network.

[0080] The present invention is believed to be applicable to a variety of systems 
for performance analysis, and is believed to be beneficial in analyzing the performance of 
distributed systems. Other aspects and embodiments of the present invention will be 
apparent to those skilled in the art from consideration of the specification and practice of 
the invention disclosed herein. It is intended that the specification and illustrated
embodiments be considered as examples only, with a true scope and spirit of the invention
being indicated by the following claims.